[Python] How do I read binary pickle data first, then unpickle it?

Posted by conradlee on Stack Overflow
Published on 2010-05-04T09:47:09Z

I'm unpickling a NetworkX object that's about 1 GB on disk. Although I saved it in binary format (using protocol 2), unpickling the file takes a very long time: at least half an hour. The machine I'm running on has plenty of memory (128 GB), so that's not the bottleneck.

I've read here that unpickling can be sped up by first reading the entire file into memory and then unpickling it (that particular thread refers to Python 3.0, which I'm not using, but the point should still hold in Python 2.6).

How do I first read the binary file, and then unpickle it? I have tried:

import cPickle as pickle
f = open("big_networkx_graph.pickle","rb")
bin_data = f.read()
graph_data = pickle.load(bin_data)

But this raises:

TypeError: argument must have 'read' and 'readline' attributes

Any ideas?
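
For reference, a minimal sketch of the distinction behind the error above: `pickle.load` expects a file-like object, while `pickle.loads` accepts the byte string returned by `f.read()`. The function names `load_pickle_from_memory` and `load_pickle_from_buffer` are hypothetical, chosen only for illustration, and the sketch assumes the whole file fits in memory. Whether this actually shortens the load time for a large graph is not guaranteed and would need to be measured.

```python
import io

try:
    import cPickle as pickle  # Python 2, as used in the question
except ImportError:
    import pickle  # Python 3 has no separate cPickle module


def load_pickle_from_memory(path):
    """Read the whole file into memory, then unpickle the byte string."""
    with open(path, "rb") as f:
        bin_data = f.read()
    # loads() takes a byte string; load() would need a file-like object.
    return pickle.loads(bin_data)


def load_pickle_from_buffer(path):
    """Same idea, but wrap the bytes in an in-memory file-like object."""
    with open(path, "rb") as f:
        buffer = io.BytesIO(f.read())
    # BytesIO provides read() and readline(), so load() accepts it.
    return pickle.load(buffer)
```

With the file from the question, this might be called as `graph_data = load_pickle_from_memory("big_networkx_graph.pickle")`.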
